METHODS AND COMPOSITIONS FOR PEPTIDE LIBRARIES
DISPLAYED ON LIGHT-EMITTING SCAFFOLDS
RELATED U.S. APPLICATION DATA 

Continuation-in-part of Ser. No. 08/812,994, filed March 4, 1997 ("Methods
for Identifying Nucleic Acid Sequences Encoding Agents that Affect Cellular
Phenotypes," Carl Alexander Kamb and Mark A. Pontz, inventors), which is a
continuation-in-part of Ser. No. 08/800,664, Feb H, 1997.

FIELD OF THE INVENTION 

The present invention relates to the field of molecular biology, and more 
particularly to genetic sequences encoding peptide display scaffolds capable of 
emitting light, and to peptide display libraries based on these scaffolds. 

BACKGROUND 

Proteins can bind to numerous chemical species, or ligands, including small 
organic molecules, nucleic acids, peptides, metal ions, and other proteins. Indeed, to
carry out a biological function, a protein must interact with another entity. The
capacity of amino acid polymers to participate in chemical interactions is one of the
major reasons for their ascendancy in the biological world. Much as the AND gate is 
the basic component of binary computers, individual proteins and their cognate 
ligands are the fundamental mechanism upon which cells and organisms are built. 

One of the most significant areas of research and development in the
pharmaceutical industry involves methods to better design or screen for ligands that
interact specifically with defined protein targets. Discovery of such ligands is the
engine that drives development of new pharmaceutical compounds. Typically, efforts
to find ligands focus on small molecules, antibodies, peptides, or RNA and DNA
aptamers. Depending on the particular application, such ligands may provide lead
compounds for drug development or probes for further research into biological 
processes. 

A flurry of recent experiments has explored the utility of peptide binding 
assays for discovery of peptide-based ligands that bind specific protein targets in 
vitro. One of the most popular methods involves phage display, i.e., the presentation
of peptide sequences on the surface of phage particles (Cwirla S.E., Peters E.A., et al.
Proc Natl Acad Sci USA 1990 Aug;87(16):6378-6382 and Cortese R., Monaci P., et al.
Curr Opin Biotechnol 1996 Dec;7(6):616-621). Filamentous phage such as M13 and
f1 have been engineered to express and present foreign peptide sequences. Two
different approaches have been of primary interest; both involve incorporation during
phage particle assembly of chimeric coat proteins that include segments of foreign
sequence. The first involves the phage coat protein gp3, which is normally present on
the phage coat in only a few copies per virus. Sequences that might be toxic at higher
concentration on the viral coat, including relatively large protein domains, can be
presented effectively using gp3 fusions. The second approach involves gp8, which is
the major coat protein present in thousands of copies per virus. gp8 fusions have the
advantage that they may reside on the virus in large amounts, thus increasing the
avidity of the interaction between the virus and potential receptors. But as a
consequence of this increased amount of fusion protein, the virus is more selective
about which sequences can be displayed using gp8 (Makowski, L. Gene 1993 Jun
15;128(1):5-11).

Other modes of surface display have also been considered. Larger, more
complex viruses including lambda and T4 have been exploited for surface display
(Mikawa Y.G., Maruyama I.N. et al. J Mol Biol 1996 Sep 13;262(1):21-30 and
Efimov V.P., Nepluev I.V., et al. Virus Genes 1995;10(2):173-177). The basic
approach is similar to that used for filamentous phages: that is, viruses are assembled
in bacterial host cells which incorporate chimeric coat or tail fiber proteins that bear 
the foreign sequences. In contrast to filamentous phages, however, these viruses 
assemble completely inside the cytoplasm and are released through cell lysis; thus, 
coat proteins are cytoplasmic proteins as opposed to membrane proteins, a feature that 
may increase the flexibility of the display mechanism. 

Bacterial cells have also been examined as vehicles for surface display. The
general approach is to use a membrane protein (e.g., OmpA in E. coli) to display
protein or peptide epitopes in an accessible manner on the cell surface (Georgiou G.,
Stephens D.L., et al. Protein Eng 1996 Feb;9(2):239-247). Even mammalian cells
have been employed as vehicles for surface display. For example, membrane proteins
such as CD4 and CD8 were first cloned by expression and ligand-based selection in
mammalian cells (Maddon P.J., Littman D.R., et al. Cell 1985 Aug;42(1):92-104 and
Littman D.R., Thomas Y., et al. Cell 1985 Feb;40(2):237-246).

One of the most appealing aspects of surface or phage display is the ability to 
screen complex peptide libraries for rare sequences that bind selectively to defined 
protein targets. The combinatorial chemistry required to generate a diverse population
[...] oligonucleotide synthesis. Furthermore, twenty amino acids with
their wide spectrum of chemical properties (e.g., hydrophobicity, charge, acidity, and
size) can create substantial chemical complexity, more so than, for example,
nucleotides. However, like nucleotides, peptide libraries displayed on phage can be
reproduced with relative ease. The replication requires nucleic acid intermediates, but
the advantages of amplification are the same; namely, the capacity for biochemical
enrichment without substantial loss of starting material, and the ability to perform
genetic experiments. 

Although surface display of peptides or proteins is useful for selecting ligands
in vitro, it is less appropriate for selections that involve intracellular processes. For
this application, expression systems inside the cell must be employed. Intracellular
ectopic expression of antibody libraries is one mode of expression (Sawyer C.,
Embleton J., et al. J Immunol Methods 1997 May 26;204(2):193-203); a second
involves expression of peptide libraries generated as fusions to cytoplasmic proteins
such as thioredoxin and GAL4 from yeast (Colas P., Cohen B., et al. Nature 1996 Apr
11;380(6574):548-550 and Fields S., Song O. Nature 1989 Jul 20;340(6230):245-246).

Although for certain applications (e.g., construction of an interaction or 
proteome map), proteins or relatively large protein fragments are superior to peptides 
for display, for other applications, it is advantageous not to be constrained by natural 
protein sequences. To identify or devise novel proteinaceous ligands and/or inhibitors
of specific targets, it may be simpler to generate and examine a chemically diverse
library of relatively low molecular weight compounds based on peptides. In addition,
peptide libraries can be used in genetic selections and screens to pinpoint peptide
ligands that bind important intracellular targets, similar to selections employed in,
e.g., the yeast two-hybrid system (Fields S., Song O. Nature 1989 Jul
20;340(6230):245-246).

Though a potentially powerful tool, intracellular display of peptide libraries by 
the methods mentioned above suffers from several limitations. First, it is often 
difficult to know what the expression level of specific peptides or peptide fusions is; 
in many cases, even an average measure of expression level is difficult to obtain.
Second, the diversity of the library is not easily estimated. It may be, for example,
that only a small subset of possible peptide sequences are presented efficiently by a
particular expression system. Third, it is not always easy to follow the expression of
peptides in particular cells; for example, to know whether or not a specific cell is
expressing a member of the library. Fourth, it is not generally possible to manipulate
the library to alter its average properties once the library has been generated; for
example, to isolate library sequences compatible with high expression. Fifth, efforts
to restrict conformational freedom (in order to promote higher binding energies), e.g.,
by inserting the peptides into the interior of protein sequences, may compound the
problems discussed above. Such inserted libraries are likely to perturb the function 
and stability of the fusion partners in ways difficult to predict and measure. A method 
is therefore needed to overcome these limitations associated with peptide or protein 
fragment display libraries. 

SUMMARY

The present invention overcomes the above-mentioned limitations by 
providing methods and compositions for peptides or protein fragments displayed on 
scaffolds and libraries of sequences encoding peptides or protein fragments displayed 
on scaffolds that permit the properties of the library to be easily and quantitatively
monitored. The scaffold is a protein that is capable of emitting light. Thus, analysis
of the expression of individual members of the library when they are expressed in
cells may be carried out using instruments that can analyze the emitted light, such as a
flow sorter (FACS), a spectrophotometer, a microtitre plate reader, a CCD, a
fluorescence microscope, or other similar device. This permits screening of the
expression library in host cells on a cell-by-cell basis, and enrichment of the library
for sequences that have predetermined characteristics.

A genetic sequence encoding a peptide display scaffold is used to create the 
libraries of the present invention. This scaffold sequence comprises a first sequence 
that encodes a molecule capable of emitting light. The first sequence contains a site,
the location of which allows a second sequence to be inserted at the site while
maintaining the ability of the [...] sequences to
emit light.

These and other features, aspects, and advantages of the present invention will 
become better understood with regard to the following description, appended claims, 
and accompanying drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Fig. 1 : Model of the backbone of GFP showing sites of aptamer insertion. 
Numbers 1-10 correspond to insertion sites in pVT22-pVT31, respectively. 
Fig. 2: Map of pVT21.

Fig. 3: Mean fluorescence intensities of cell populations harboring GFP
scaffold candidates, and various controls.

Fig. 4: Fluorescence intensity scan of pVT21, pVT27, and pVT27APT2.
Bgd: pVT21-containing yeast, grown under repressing conditions (dextrose).

Fig. 5A: Mean fluorescence intensities of 10 sorted pVT27APT2 yeast clones 
(B1-B10). 

Fig. 5B: Western blot analysis of GFP-aptamers from 10 pVT27APT2 yeast
clones.

Fig. 6: Map of mammalian expression vector. 

Fig. 7: Fluorescence intensity scan of HS294T p16 lacI cells expressing either
E-GFP alone (pVT334), or E-GFP bearing internal insertions of DNA encoding 15
amino acid random peptides.

Fig. 8: Western blot analysis of HS294T p16 lacI clones expressing E-GFP
variants, as follows: marker (lane 1); 25 ul control of non-infected cells (lane 2); 5 ul
of day-two pVT334 (lane 3); 25 ul of day-two internal library (lane 4); 5 ul of day-eight
pVT334 (lane 5); 25 ul of day-eight internal library (lane 6); 5 ul of day-eight
pVT334 (lane 7); 25 ul of day-eight internal library (lane 8); 3 ul of day-two pVT334
(lane 9); 25 ul of day-two internal library (lane 10).

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Definitions 

The term "scaffold" refers to a protein that can be used to display amino acid
sequences as part of a fusion protein or insertion involving the scaffold as a backbone.

The term "protein domain" or "protein fragment" refers to a portion of a native
protein typically generated by expression of gene or cDNA fragments.

The term "aptamer" refers to a polymeric molecule, typically composed of
nucleotides or amino acids, capable of adopting specific conformations and 
interacting physically and/or chemically with other molecules. 

The term "FU" refers to fluorescence units. Note that FU are arbitrary measures of
fluorescence and cannot be compared between experiments.

The terms "genetic library" or "library" refer to a collection of DNA fragments
that may range in size from a few base pairs to a million base pairs. These fragments
are contained as inserts in vectors capable of propagating in host cells that may be
bacterial, archaebacterial, fungal, mammalian, insect, or plant cells.

The term "insert" in the context of a library refers to an individual DNA
fragment that constitutes a single member or element of the library.

The term "sub-library" refers to a portion of a genetic library that has been 
isolated or selected by application of a specific screening or selection procedure. 

The term "vector" refers to a DNA or RNA sequence that is capable of
propagating in particular host cells and can accommodate inserts of foreign nucleic 
acid. Typically, vectors can be manipulated in vitro to insert foreign nucleic acids and 
the vectors can be introduced into host cells such that the inserted nucleic acid is 
transiently or stably present in the host cells. 

The term "host cell" refers to a cell of prokaryotic, archaebacterial, or
eukaryotic origin that can serve as a recipient for a vector that is introduced by any
one of several procedures. The host cell often allows replication and segregation of
the vector that resides within. In certain cases, however, replication and/or
segregation are irrelevant; expression of vector or insert DNA is the objective.
Typical bacterial host cells include E. coli and B. subtilis; archaebacterial host cells
include S. acidocaldarius and H. salinarium; fungal host cells include S. cerevisiae
and S. pombe; plant cells include those isolated from A. thaliana and Z. mays; insect
host cells include those isolated from [...], A. aegypti, and S. frugiperda;
and mammalian cells include those isolated from human tissues and cancers including
melanocyte (melanoma), colon (carcinoma), prostate (carcinoma), and brain (glioma,
neuroblastoma, astrocytoma).

The term "reporter" refers to a protein (and "reporter gene" to the gene that
encodes it) that serves as a surrogate for expression of specific sequences in the
genome, or that allows the activity of cis regulatory sequences to be monitored easily
and, preferably, in a quantitative fashion. Reporters may be proteins capable of
emitting light such as GFP (Chalfie M., Tu Y., et al., Science 1994 Feb. 11; 263:802-805)
or luciferase (Gould S.J., and Subramani S., Anal. Biochem. Nov. 15; 175: 5-15
(1988)), or intracellular or cell surface proteins detectable by antibodies such as CD20
(Koh J., Enders G.H., et al. Nature 1995 375:506-510). Alternatively, reporter genes
can confer antibiotic resistance such as hygromycin or neomycin resistance (Santerre
R.F., et al., Gene 30: 147-156 (1984)).

The terms "bright" and "dim" in the context of a cell sorter refer to the
intensity levels of fluorescence (or other modes of light emission) exhibited by
particular cells. Bright cells have high intensity emission relative to the bulk
population of cells; dim cells have low intensity emission relative to the bulk
population.

The term "perturbagen" refers to an agent that acts in a transdominant mode to 
interfere with specific biochemical processes in cells. In the context of the present 
invention, perturbagens are typically either proteins, protein fragments, or peptides,
although the term also encompasses nucleic acids and other organic molecules with
similar properties.

The term "transdominant" describes a type of interaction whereby the agent
(most typically a perturbagen) is a diffusible substance that can bind its target in
solution. Thus, a transdominant agent is dominant as opposed to recessive in a 
genetic sense, because it acts on gene products and not on alleles of genes. The 
effects of a perturbagen are visible in the presence of wild type alleles of its target. 

The term "phenocopy" refers to a phenotypic state or appearance that mimics
or resembles the state induced by mutation of a specific gene or genes. This state 
may, for example, be induced by expression of perturbagens within a particular host 
cell. 

The term "GFP" refers to a member of a family of naturally occurring 
fluorescent proteins, whose fluorescence is primarily in the green region of the 
spectrum. The term includes mutant forms of the protein with altered spectral 
properties. Some of these mutant forms are described in Cormack B.P., Valdivia 
R.H., and Falkow S., Gene 173: 33-38 (1996) and Ormo M., Crystal structure of the
Aequorea victoria green fluorescent protein. Science 273 (5280): 1392-1395 (1996).
The term also includes polypeptide analogs, fragments or derivatives of polypeptides
which differ from naturally-occurring forms in the identity or location of one or more
amino acid residues, for example, deletion, substitution and addition analogs, which
share some or all of the properties of the naturally occurring forms. Wild type GFP
absorbs maximally at 395 nm and emits at 509 nm. High levels of GFP expression
have been obtained in cells ranging from yeast to human cells. It is a robust,
all-purpose reporter, whose expression in the cytoplasm can be measured quantitatively
using instruments such as the FACS. The term also includes BFP, the coding
sequence for which is described in Anderson M.T., Tjioe I.M., Lorincz M.C., Parks
D.R., Herzenberg L.A., Nolan G.P., Herzenberg L.A., Proc. Natl. Acad. Sci. (USA)
93: 16, 8508-8511 (1996).

The term "constrained conformation" when used in reference to an amino acid 
sequence means a position in which the sequence is tethered at both ends (for 
example, to a protein) imposing significant restraints on the conformational flexibility 
of the amino acid sequence. Limiting the conformational flexibility of the amino acid
sequence promotes higher binding energies between the sequence and potential
binding partners, increasing the efficiency of screening methods.
A. Overview 

The present invention provides methods and compositions for constructing and 
using peptides or protein fragments displayed on scaffolds and libraries of sequences 
encoding peptides or protein fragments displayed on scaffolds. The methods employ
as a scaffold a protein capable of emitting light. This [...]
rigorous, quantitative analysis of the library, advantages that are either difficult or
impossible to obtain in other settings. In a preferred embodiment, the scaffold used is
an autofluorescent protein, e.g., the green fluorescent protein (GFP) from the jellyfish
Aequorea victoria (Chalfie M., Tu Y., et al. Science 1994 Feb 11;263(5148):802-805).

Sites on the scaffold protein that are appropriate for insertion of random 
peptide sequences are identified. Appropriate sites would accommodate peptide 
insertions without seriously disturbing protein function. Sites that not only accept 
small inserted sequences, but also accept a wide variety of different sequences are 
described. Such sites are by definition robust to chemical perturbation. Some 
proteins accommodate insertions at numerous sites throughout their primary sequence. 
Others are much less accommodating. It is difficult in general to predict which 
proteins are robust to insertions, and which sites in a particular protein are best suited 
to insertion of multiple independent sequences. However, in cases where three- 
dimensional structures are available, or where primary sequences of several members 
of a protein family can be examined, certain regions are more likely to accept 
insertions. Such regions include solvent exposed regions and regions of relatively 
high primary sequence variability. 

Autofluorescent proteins provide a ready assay for identification of appropriate 
insertion locations. Because the activity of the protein (and by inference its expression 
level) can be monitored quantitatively using a flow sorter, it is simple to assay many 
independent insertions either sequentially or in bulk population. The best candidates 
can then be screened for or selected from the population. Mutant proteins are 
generated by manipulating the DNA sequence, such that a variety of different
insertions are generated and examined by flow cytometry to locate variants that retain
autofluorescent properties. Variants identified in this fashion reveal the nature of sites 
within the protein best suited for display of foreign sequence. 

Once suitable insertion sites are discovered, it is possible to monitor
quantitatively the characteristics (light emission in the case of an autofluorescent
protein) of the individual scaffolds that are chosen. The flow sorter serves as an
appropriate tool for such analysis. A family of peptides (preferably a relatively large
family having from around 10^3 to 10^7 members) is inserted into the scaffolds at
the predetermined position to generate an expression library, and the fluorescence
properties of the library are examined. Quantitative parameters such as mean
fluorescence intensity and variance can be determined from the fluorescence intensity
profile of the library population (Shapiro H. Practical Flow Cytometry 1995 217-228).
This permits an estimate of the percentage of library sequences that do not lend
themselves to expression in this context, and hence, an estimate of the library
complexity.
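
The quantities named above can be illustrated with a minimal Python sketch. It assumes per-cell fluorescence readings (in arbitrary FU) are already available as a list; the function names, the background cutoff, and the sample readings are hypothetical and are not taken from the experiments described here.

```python
from statistics import mean, pvariance


def summarize_profile(intensities, background_cutoff):
    """Summarize per-cell fluorescence readings (arbitrary FU).

    Cells at or below `background_cutoff` are counted as non-expressing.
    The cutoff is an assumed, user-chosen value.
    """
    if not intensities:
        raise ValueError("need at least one cell reading")
    dim = sum(1 for fu in intensities if fu <= background_cutoff)
    return {
        "mean": mean(intensities),
        "variance": pvariance(intensities),
        "fraction_dim": dim / len(intensities),
    }


def expressed_complexity(nominal_library_size, fraction_dim):
    """Rough estimate of how many library members are actually expressed."""
    return nominal_library_size * (1.0 - fraction_dim)


# Hypothetical readings for eight cells, for illustration only.
readings = [2.0, 3.0, 150.0, 220.0, 90.0, 1.0, 310.0, 4.0]
stats = summarize_profile(readings, background_cutoff=5.0)
estimate = expressed_complexity(1_000_000, stats["fraction_dim"])
```

Here the fraction of cells at or below background stands in for the percentage of library sequences that do not express, and scaling the nominal library size by the remaining fraction gives a crude complexity estimate.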

The flow sorter can be used not only as a screen to examine the properties of
the generated expression libraries, but also as a tool to manipulate and bias the
libraries in potentially useful ways. For example, in certain cases it may be helpful to
select from the expression library those sequences that express the highest levels of
protein in cells. Alternatively, it may be desirable simply to exclude all library
constructs that do not express scaffold levels above the background; many of the
negative or "dim" cells may harbor expression constructs that produce truncated or
misfolded proteins that are degraded or do not function as soluble peptide display
scaffolds (Dopf J., Horiagon T.M. Gene 1996 173:39-44). The flow sorter permits
such selections to be carried out with extraordinary efficiency because cells can be
sorted at a rate of ten to one hundred million per hour (Shapiro H. Practical Flow
Cytometry 1995 217-228).
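
The two selections described above (excluding cells at background, or keeping only the brightest cells) and the stated sorting rate can be sketched in Python as follows. This is an illustrative sketch only: the cell identifiers, FU values, and cutoff are hypothetical, and a real flow sorter applies its gates in hardware rather than on a list.

```python
def gate_bright(cells, background_cutoff):
    """Return IDs of cells whose fluorescence exceeds the background cutoff."""
    return [cell_id for cell_id, fu in cells if fu > background_cutoff]


def top_fraction(cells, fraction):
    """Return IDs of the brightest `fraction` of cells (at least one cell)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(cells, key=lambda cell: cell[1], reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return [cell_id for cell_id, _ in ranked[:keep]]


def hours_to_sort(n_cells, cells_per_hour):
    """Sorting time implied by a given throughput."""
    return n_cells / cells_per_hour


# Hypothetical (cell ID, FU) pairs.
cells = [("c1", 3.0), ("c2", 180.0), ("c3", 45.0), ("c4", 1.5), ("c5", 400.0)]
above_background = gate_bright(cells, background_cutoff=10.0)
brightest = top_fraction(cells, fraction=0.4)

# Time to sort fifty million cells at the two ends of the stated rate.
slow = hours_to_sort(50_000_000, 10_000_000)
fast = hours_to_sort(50_000_000, 100_000_000)
```

At the stated rate of ten to one hundred million cells per hour, a population of fifty million cells would take between half an hour and five hours to sort.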

The libraries of sequences encoding peptides displayed on autofluorescent
scaffolds of the present invention provide the means to carry out genetic or
pseudogenetic experiments of considerable interest. These experiments involve
generation of phenocopies of mutants by overexpression of peptide inhibitors in cells.
Such experiments have been performed in specific contexts before (PCT US97/14514,
Selection Systems for the Identification of Genes Based on Functional Analysis;
U.S. Patent Application 08/812,994, Methods for Identifying Nucleic Acid Sequences
Encoding Agents that Affect Cellular Phenotypes, filed March 4, 1997).

Peptide-based ligands are useful in a variety of contexts as probes of biological 
functions, or as aids in the development of therapeutic compounds. A variety of
techniques have been developed to isolate specific peptides from complex libraries
which bind to defined targets in vitro. In addition, the notion of using peptide
libraries expressed in cells as agents to disrupt specific biochemical pathways has
been explored recently (PCT US97/14514, Selection Systems for the Identification of
Genes Based on Functional Analysis). These agents are called "perturbagens" by
analogy with mutagens that alter the genetic material. Perturbagens, rather than
causing mutations in genes, achieve their effect by specifically binding targets in the 
cell, thereby perturbing particular biochemical processes. 

To enable such pseudo-genetic analysis, a display system that operates inside 
living cells is required. The protein scaffolds of the present invention provide such a 
display system. The protein scaffolds of the present invention are relatively resistant 
to degradation by proteases within the cell and display peptides in a constrained 
conformation. In addition, they are soluble, even when joined to a wide variety of
foreign peptide sequences. They also allow the quantitative performance of the
scaffold to be measured in terms of its ability to display peptides and maintain high
levels of stability and expression in cells. 

B. Insertion Site Design 

An initial step in designing the display scaffold is determining the site (or
sites) that accommodate foreign peptide sequences. In the case of GFP, it is likely
that the molecule is highly sensitive to perturbations as dramatic as amino acid
insertions due to the compact, spare nature of the structure (Ormo M., Cubitt A.B., et
al. Science 1996 273:1392-1395). The recently-solved crystal structure of GFP
reveals that this protein assumes a beta-barrel structure and has ten solvent-accessible
loops, two of which connect the helical chromophore segment to the rest of the
protein (Ormo M., Cubitt A.B. et al. Science 1996 273:1392-1395). The remaining 8
loops connect the beta-strands of the barrel to one another. These loops are candidate
sites for the insertion of random aptamers. By inserting aptamers into the beta-turns
in GFP, loops can be identified by flow cytometry which accommodate random
aptamers while allowing GFP to retain fluorescence. Although GFP is known to
readily accept N- and C-terminal fusions, there are two reasons for preferring internal
sites for peptide display. First, conformational freedom is reduced by tethering the
two ends of the aptamer to rigid components of the structure; for aptamers located at
the protein termini, it is only possible to tether one end (Ladner, R. Trends Biotechnol
1995 13:426-430). Second, aptamers at either terminus will be charged, which limits
the range of chemical/structural possibilities encompassed by the library.

In the case of other autofluorescent proteins for which three-dimensional 
structural information is not available, it may be possible to exploit comparisons of 
gene family members. One historical approach to establishing the structural
requirements of proteins is to compare amino acid sequences of proteins of similar
function, within a single species and among different phyla. Such comparisons may
shed light on the structurally important regions because these are the most likely to be
conserved among family members. Sites that tolerate amino acid changes without
compromising protein function are the most likely to vary in sequence.
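
The family comparison described above can be sketched in Python by counting the distinct residues at each column of a set of pre-aligned sequences. This is a simplified illustration that assumes the sequences are already aligned to equal length; the four toy sequences below are invented for the example and are not real fluorescent protein sequences.

```python
def column_variability(aligned):
    """Number of distinct residues at each position of pre-aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [len(set(column)) for column in zip(*aligned)]


def variable_positions(aligned, min_distinct=2):
    """0-based positions where at least `min_distinct` residues are observed."""
    return [
        index
        for index, distinct in enumerate(column_variability(aligned))
        if distinct >= min_distinct
    ]


# Invented toy alignment of four hypothetical family members.
family = ["MKGEEL", "MKGDEL", "MRGSEL", "MKGTEL"]
variability = column_variability(family)
candidates = variable_positions(family, min_distinct=3)
```

Positions with many distinct residues are the ones that vary among family members and are therefore more likely to tolerate changes; positions with a single residue are conserved.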

An additional approach that is possible with autofluorescent proteins involves
a blind "hit or miss" approach. The sequence of an autofluorescent protein may be
deliberately varied such that, e.g., an insertion at every possible position is generated
(Ausubel F.M., Brent R., et al., Current Protocols in Molecular Biology, John Wiley
and Sons, New York (1996); Sambrook J., Fritsch E.F., and Maniatis, T., Molecular
Cloning: A Laboratory Manual, Second Edition, CSHL Press, New York (1989)).
These insertion mutants may be analyzed individually using a flow sorter after
expression in cells, or the entire population may be analyzed in bulk, and the mutants
that produce fluorescent protein at or above a predetermined threshold level in cells
may be collected, separated from each other, and analyzed individually afterward.
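
A minimal Python sketch of this "hit or miss" scan follows. It enumerates one insertion mutant per internal position of a sequence and then keeps the positions whose measured fluorescence meets a threshold. The placeholder scaffold, the insert, the FU values, and the threshold are all hypothetical; restricting the scan to internal positions is an assumption made for the example.

```python
def insertion_mutants(scaffold, insert):
    """Yield (position, mutant) with `insert` placed between each pair of
    adjacent residues; position N means the insert follows residue N."""
    for position in range(1, len(scaffold)):
        yield position, scaffold[:position] + insert + scaffold[position:]


def collect_above_threshold(measurements, threshold):
    """Positions whose measured fluorescence is at or above the threshold."""
    return sorted(pos for pos, fu in measurements.items() if fu >= threshold)


# Placeholder five-residue scaffold and two-residue insert.
mutants = list(insertion_mutants("ABCDE", "xx"))

# Hypothetical FU readings keyed by insertion position.
readings = {1: 5.0, 2: 120.0, 3: 80.0, 4: 2.0}
tolerated = collect_above_threshold(readings, threshold=50.0)
```

In practice the readings would come from the flow sorter, either one mutant at a time or by sorting the bulk population and identifying the collected clones afterward.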
C. Genetic Libraries

Once suitable scaffold candidates have been identified by the experiments 
described above, the candidates must be tested further to define the individual 
scaffolds that are capable of displaying a wide range of peptide sequences at the
specified site(s). It is possible, for instance, that a site defined by experiments
described above may only accept a very limited diversity of inserted sequences;
alternatively, it is possible that the linker inserted above may represent an upper limit
for the size of inserted sequences. Thus, introduction of an additional insert from the
library may render the protein, e.g., unstable. Therefore, the capacity of the scaffold
candidates to accept library inserts must be tested by introduction of a population of
different inserts, and quantitation of the effects of the library sequences on the level of
scaffold expression.

The library may be generated in a variety of ways. The simplest way to create 
a large number of diverse sequences involves oligonucleotide synthesis. For example,
a random oligonucleotide of length 24 encodes all possible peptides of length 8, a
number that exceeds ten billion. A library of this size is so large that it is difficult to
prepare. Libraries typically range in size from at least several thousand to about one 
hundred million individual species. Such libraries might involve all possible peptides 
of length 6, or might involve subsets of libraries composed of longer sequences. 
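
The arithmetic behind these figures can be checked with a short Python sketch. It assumes 20 amino acids, 4 nucleotides, and 3 nucleotides per codon; the function names are illustrative only.

```python
AMINO_ACIDS = 20
NUCLEOTIDES = 4
CODON_LENGTH = 3


def peptide_count(length):
    """Number of distinct peptides of the given length."""
    return AMINO_ACIDS ** length


def oligo_count(length):
    """Number of distinct random oligonucleotides of the given length."""
    return NUCLEOTIDES ** length


def max_full_length(library_budget):
    """Longest peptide length whose full set of sequences fits in the budget."""
    length = 0
    while peptide_count(length + 1) <= library_budget:
        length += 1
    return length


octamers = peptide_count(8)            # 25,600,000,000 (exceeds ten billion)
hexamers = peptide_count(6)            # 64,000,000
oligo_24mers = oligo_count(8 * CODON_LENGTH)
longest = max_full_length(100_000_000)
```

With a practical upper bound of about one hundred million members, the longest peptide whose every sequence can be represented is a hexamer (64 million sequences); an octamer library would need more than 25 billion members.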

Libraries may also be generated from natural DNA sequences such as mRNA
or genomic DNA. Typically such libraries would be biased toward native proteins
and protein fragments. Thus, these libraries may contain a significant fraction of
sequences that encode polypeptides that interact with native proteins in the cell.
When such fragments are inserted into the autofluorescent scaffold, they may fold into
a conformation that resembles a domain from the cognate native protein from which
they are derived (Bartel P.L., Roecklein J.A., et al. Nat Genet 1996 Jan;12(1):72-77).

DNA sequences generated as synthetic oligonucleotides or as cDNA or 
genomic DNA can be inserted into appropriate expression vectors in a variety of 
ways. Such methods for vector and insert preparation, ligation, and transformation
are known in the art (Ausubel et al., supra). In general, it is necessary to produce a
vector that has an appropriate restriction site for inserting foreign DNA into the
scaffold gene, to produce a linear vector such that the site is available for ligation, to
mix the vector and library insert DNAs together under suitable reaction conditions, to
permit the ligation to proceed for sufficient time, and to introduce the ligated material
into a suitable host such as, e.g., E. coli, such that individual clones (preferably a few
million) can be selected for further experiments.

D. Expression Vector 

The invention preferably employs an expression vector capable of producing
high levels of the peptide or protein fragment displayed on a scaffold protein. As
discussed above, it is often difficult to determine the quality (i.e., diversity and
expression levels) inside cells of a library of sequences encoding a peptide/scaffold
combination. In the case of autofluorescent proteins, however, it is relatively easy to
determine the quantitative characteristics of the library. A flow sorter or similar
device provides rapid quantitative information about the expression level of the
library within living cells (Shapiro H. Practical Flow Cytometry 1995).

The choice of promoter used to drive expression of the autofluorescent
scaffold protein depends on which cells are to be examined. In most organisms and
cell types that are used in biological or medical experiments, numerous promoter
types are available. In general, strong promoters are preferred, because they will
facilitate higher expression levels of library sequences in the chosen host cells. Such
promoters are typically derived from housekeeping genes that are expressed at high
levels in most or all cell types in the organism, or from viruses. Numerous such cis
regulatory sequences are known in the art, suitable for driving expression in
mammalian cells, insect cells, plant cells, fungi or bacteria (Ausubel et al., 1996;
vector database located at http://www.atcg.com/vectordb/). For example, in
eukaryotes the promoter for beta actin is useful (Qin Z., Kruger-Krasagakes S., et al.
J Exp Med. 178:^55-160); in plants the Cauliflower Mosaic Virus 35S promoter
is useful (Goddijn O.J., Pennings F.J., et al., Transgenic Res 1995 4:315-323). In mammalian
cells, the cytomegalovirus (CMV) promoter is commonly used; and in general, a
promoter that drives high level expression of, e.g., a housekeeping or viral gene can
be identified with relative ease using current molecular genetic methods.
E. Nucleic Acid Transfer

During the last two decades several basic methods have evolved for
transferring exogenous nucleic acid into host cells. These methods are well-known in
the art (Ausubel F., Brent R. et al. infra; Sambrook J., Fritsch E.F., and Maniatis T.,
supra). For cells that are grown in tissue culture (e.g., mammalian, plant, and insect
cells), numerous methods for nucleic acid transfer are also available. Some methods
give rise primarily to transient expression in host cells; i.e., the expression is gradually
lost from the cell population. Other methods can also generate cells that stably
express the transferred nucleic acid, though the percentage of stable expressers is
typically lower than transient expressers. Such methods include viral and non-viral
mechanisms for nucleic acid transfer.

In the case of viral transfer, a viral vector is used to carry nucleic acid inserts
into the host cell. Depending on the specific virus type, the introduced nucleic acid
may remain as an extrachromosomal element (e.g., adenoviruses, Amalfitano A.,
Begy C.R., and Chamberlain J.S.; Proc. Natl. Acad. Sci. USA 1996 93:3352-3356) or
may be incorporated into a host chromosome (e.g., retroviruses, Iida A., Chen S.T., et
al. J. Virol. 1996 70:6054-6059).

In the case of non-viral nucleic acid transfer, many methods are available
(Ausubel F., Brent R. et al. 1996). One technique for nucleic acid transfer is CaPO4
coprecipitation of nucleic acid. This method relies on the ability of nucleic acid to
coprecipitate with calcium and phosphate ions into a relatively insoluble CaPO4 grit,
which settles onto the surface of adherent cells on the culture dish bottom. The
precipitate is, for reasons that are not clearly understood, absorbed by some cells and
the coprecipitated DNA is liberated inside the cell and expressed. A second class of
methods employs lipophilic cations that are able to bind DNA by charge interactions
while forming lipid micelles. These micelles can fuse with cell membranes,
delivering their DNA cargo into the host cell where it is expressed. A third method of
nucleic acid transfer is electroporation, a technique that involves discharge of voltage
from the plates of a capacitor through a solution containing DNA and host cells. This
process disturbs the bilayer sufficiently that DNA contained in the bathing solution is
able to penetrate the cell membrane.

Several of these methods often involve the transfer of multiple DNA
fragments into individual cells. It is often difficult to limit the quantity of DNA taken
up by a single cell to one fragment. However, by using "carrier" nucleic acid (e.g.,
DNA such as herring sperm DNA that contains no sequences relevant to the
experiment), or reducing the total amount of DNA applied to the host cells, the
problem of multiple fragment entry can be reduced. In addition, the invention does
not specifically require that each recipient cell have a single type of library sequence.
Multiple passages of the library through the host cells (see below) permit sequences
of interest to be separated ultimately from sequences that may be present initially as
bystanders. Moreover, the presence of multiple independent vector/insert constructs
in a cell may be an advantage in certain cases because it allows more library inserts to
be screened in a single experiment.

For microbial cells such as bacteria and fungi, general methods such as
electroporation work very well. In addition, methods have been customized to
specific organisms, many of which involve pretreatment of the cells with salts (e.g.,
LiOAc for S. cerevisiae, CaCl2 or RbCl for E. coli). These methods are known in
the art (Ausubel et al., 1996; Sambrook et al., 1989).

F. Screen By Flow Sorter 

An important benefit of the present invention involves the ability to quantify 
the characteristics of a library that is generated in an autofluorescent protein scaffold. 
To do this, a flow sorter or similar device may be used, as such devices are capable of
rapidly examining a large number of individual cells that contain library inserts (e.g., 
10-100 million cells per hour) (Shapiro H. Practical Flow Cytometry 1995). 

Fluorescence measurements of the library expressed in particular host cells
preferably involve comparisons with controls; for example, host cells that lack the
expression construct (negative controls), and host cells that express the
autofluorescent protein using the same expression vector in which the library is
constructed, but without any inserted sequence in the autofluorescent protein (positive
controls). These controls set limits on both the low (background) fluorescence end of
the spectrum, and the high end. From these initial measurements, mean levels of
fluorescence can be determined, as well as a rough gauge of the variance of the
distribution. For instance, the wild type autofluorescent protein may be expressed
such that a mean fluorescence intensity of 1000x is attained in the specific expression
vector and host cells used in the experiment; the host cells without the expression
vector may have a mean (background) fluorescence intensity of x. The scaffold that
contains a linker appropriate for insertion may have a mean intensity that is 100x, and
the scaffold plus library may have a mean intensity that is 25x. In addition, the
standard deviation of the library fluorescence intensity distribution may be roughly +/-
20x.
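
The comparison with controls described above can be sketched as a short calculation. The following Python sketch assumes that per-cell intensities have already been exported from the cytometer as lists of numbers; the function name and all sample values are hypothetical and were chosen only to mirror the illustrative 1000x, 100x, 25x and x figures given above.

```python
from statistics import mean, stdev


def summarize_relative_to_background(populations, background_key="host_only"):
    """Return (mean, standard deviation) of each population, expressed in
    units of the mean background fluorescence (the "x" of the text)."""
    x = mean(populations[background_key])
    return {
        name: (mean(values) / x, stdev(values) / x)
        for name, values in populations.items()
    }


# Hypothetical per-cell intensities in arbitrary fluorescence units.
populations = {
    "host_only": [0.9, 1.0, 1.1],                   # negative control
    "wild_type_gfp": [950.0, 1000.0, 1050.0],       # positive control
    "scaffold_with_linker": [90.0, 100.0, 110.0],
    "scaffold_with_library": [5.0, 25.0, 45.0],
}

summary = summarize_relative_to_background(populations)
for name, (rel_mean, rel_sd) in summary.items():
    print(f"{name}: mean = {rel_mean:.0f}x, sd = {rel_sd:.1f}x")
```

Expressing every population in background units makes measurements taken with different instrument settings easier to compare, at the cost of depending on a well-measured negative control.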

It may be desirable also to compare mean fluorescence levels with
biochemically determined levels of autofluorescent protein with and without inserted
foreign sequence(s). For example, a western blot comprising lanes with various
dilutions of purified (or at least known amounts of) autofluorescent protein (e.g.,
GFP) may be run beside a lane prepared from a cell lysate of host cells that harbor the
expressed library to provide a biochemical estimate of autofluorescent expression
levels in host cells. A monoclonal antibody directed against an epitope that is
preserved in the scaffold protein can be used to bind the protein present on the blot
and can be indirectly visualized by an appropriately labeled second antibody
according to methods known in the art (Ausubel et al., 1996; Sambrook et al., 1989).
This allows correlation of mean fluorescence intensity values with the mass of the
scaffold protein in cells. From such experiments, the approximate cytoplasmic
concentration of library sequences expressed in cells may be calculated. This in turn
may permit estimation of the dissociation/inhibition constants that are most likely to
apply to perturbagen/target interactions within the cell (see below).
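
One way such a biochemical estimate might be turned into a cytoplasmic concentration is sketched below in Python. This is a minimal illustration under stated assumptions, not a procedure taken from the specification: the band signals, the number of cells loaded, and the cell volume (5e-14 L) are hypothetical values, and the molecular weight of roughly 27,000 g/mol is that of GFP without an insert.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def scaffold_concentration_molar(band_signal, standards_ng, standards_signal,
                                 cells_loaded, cell_volume_l,
                                 mol_weight_g_per_mol):
    """Estimate the molar concentration of scaffold protein per cell from a
    lysate band, using a dilution series of purified protein as standards."""
    slope, intercept = fit_line(standards_ng, standards_signal)
    ng_in_lane = (band_signal - intercept) / slope   # read off standard curve
    grams_per_cell = ng_in_lane * 1e-9 / cells_loaded
    moles_per_cell = grams_per_cell / mol_weight_g_per_mol
    return moles_per_cell / cell_volume_l


# Hypothetical dilution series of purified GFP and one lysate lane.
conc = scaffold_concentration_molar(
    band_signal=270.0,
    standards_ng=[1.0, 2.0, 4.0, 8.0],
    standards_signal=[100.0, 200.0, 400.0, 800.0],
    cells_loaded=1e6,             # assumed number of cells in the lane
    cell_volume_l=5e-14,          # assumed cell volume
    mol_weight_g_per_mol=27_000.0,
)
print(f"estimated concentration: {conc * 1e6:.2f} micromolar")
```

A concentration obtained in this way gives only an order-of-magnitude guide to the dissociation constants that could be relevant, since it ignores subcellular compartments and protein turnover.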

The procedures for quantitation and screening described above can be applied
both to the preparation of scaffold candidates, and to the generation of insertional
libraries using the scaffold candidates as insertion or fusion partners. Thus, scaffold
proteins that contain linkers inserted at defined or random positions can be tested for
fluorescence properties. The scaffolds that exhibit good quantitative behavior (e.g.,
consistent, robust expression in a variety of different host cells) according to the flow
cytometry readouts can be further examined after a library of sequences has been
inserted into the linker site.

These quantitative measurements provide useful information about the
expression library. The measurements permit estimates of library diversity (defined
here as the fraction of individual inserts that express significant levels of scaffold
protein multiplied by the total number of independent clones in the library),
qualitative assessment of the robustness of particular scaffold proteins, and evaluation
of the relative and absolute levels of scaffold expression in a bulk population of cells
and in individual cells.
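
The definition of library diversity given above reduces to a single multiplication, sketched here in Python. The function name and the example figures (two million independent clones, 40% of which express significant levels of scaffold protein) are hypothetical.

```python
def library_diversity(independent_clones, fraction_expressing):
    """Diversity as defined in the text: the fraction of inserts that express
    significant levels of scaffold protein, multiplied by the total number
    of independent clones in the library."""
    if not 0.0 <= fraction_expressing <= 1.0:
        raise ValueError("fraction_expressing must lie between 0 and 1")
    return independent_clones * fraction_expressing


diversity = library_diversity(2_000_000, 0.4)
print(f"estimated diversity: {diversity:,.0f} clones")
```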

G. Selection By Flow Sorter

The flow sorter has the ability not only to measure fluorescence signals in cells
at a rapid rate, but also to collect cells that have specified fluorescence properties.
This feature may be employed in a preferred embodiment of the invention to enrich
the initial library population for sequences that have predetermined characteristics.
For example, a library created by insertion of a set of oligonucleotides of random
sequence into the autofluorescent protein coding sequence will include a percentage of
sequences that contain termination codons. This percentage can be minimized by
biasing the library inserts against an A in the third position of a codon to
reduce the incidence of termination codons in the inserts. In all likelihood, however,
some sequences with termination codons will be present in the library. Expression of
such sequences within cells will result in truncated scaffold proteins that likely are no
longer fluorescent. In addition, there may be other library sequences that for different
reasons do not produce fluorescent proteins inside cells; for instance, the scaffold
protein plus insert may fold incorrectly or may be digested rapidly by proteases within
the cell. These library sequences that result in non-fluorescent protein may be easily
eliminated from the library set by collecting cells on the cell sorter which express
levels of fluorescence above a predetermined threshold criterion. Such a selection
procedure improves the quality of the library by removing those members that are
most likely not to produce functional proteins. Typically libraries of more than a few
million clones are difficult to construct and screen in vivo. Thus, in some cases a
premium may be placed on ensuring that the maximum number of library sequences
express stable proteins. The selection experiments can be performed in a variety of
host cells such as yeast, bacteria, plant, insect, or mammalian cells depending on the
requirements of the experiment and the capabilities of the expression vectors being
used.

In certain cases it may be desirable to enrich the library for sequences that are
compatible with very high levels of expression of the scaffold protein. It is possible,
even likely, that expression of a diverse set of sequences carried in a scaffold protein
will generate a wide range of expression levels in cells due to different stabilities,
folding tendencies, etc. This can be visualized on the flow sorter as a broadening of
the distribution of fluorescence intensities. The distribution may range from
background to the mean expression of the wild type autofluorescent protein expressed
under the same conditions as the library, and beyond. To bias the library toward
sequences compatible with the highest levels of protein expression, cells may be
collected on the flow sorter that fall near the extreme right ("bright") end of the
fluorescence intensity distribution. This process can be repeated in order to further
skew the library population toward those that are expressed at the highest levels in the
host cells. Such a procedure may be useful, if for example, the genetic experiments
described below rely on expression of perturbagen molecules in cells at very high
levels. The enrichment of the library may be achieved by examination of library-
containing cells of different types (e.g., yeast, bacteria, plant, insect, or mammalian)
depending on the objective of a particular experiment.
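
The collection of cells from the "bright" end of the distribution can be sketched as follows. This Python fragment is an illustration only: the sort is modeled as keeping a fixed top fraction of measured events, the clone identifiers and intensities are hypothetical, and the regrowth of sorted cells between rounds is not modeled.

```python
def brightest_fraction(cells, fraction):
    """Return the (clone_id, intensity) events that fall in the top
    `fraction` of the fluorescence intensity distribution."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(cells, key=lambda cell: cell[1], reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical events: (clone identifier, fluorescence intensity).
cells = [("a", 3.0), ("b", 250.0), ("c", 40.0), ("d", 900.0), ("e", 12.0),
         ("f", 510.0), ("g", 75.0), ("h", 1.5), ("i", 330.0), ("j", 60.0)]

sorted_pool = brightest_fraction(cells, 0.2)
print([clone for clone, _ in sorted_pool])
```

Repeating the call on the regrown, re-measured pool corresponds to the repeated rounds of sorting described above.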

H. Peptide/Protein Fragment Display as Perturbagens

Perturbagens as defined supra behave in a transdominant mode to interfere
with native functions of cellular components in vivo. For the purposes of the present
invention, perturbagens take the form of proteins, protein fragments, and peptides (as
disclosed in co-owned Ser. No. 08/812,994, Methods for Identifying Nucleic Acid
Sequences Encoding Agents that Affect Cellular Phenotypes, filed March 4, 1997).
Perturbagens have the advantage that, when overexpressed, they can produce a mutant
phenocopy by inhibiting the products of both allelic gene copies in cells. In this
manner, they overcome one limitation of conventional genetic analysis in diploid
cells; namely, the difficulty of isolating recessive mutants. Furthermore, DNA
sequences that encode perturbagens are easily recovered from cells by, e.g., PCR. In
addition, the target of the perturbagen in vivo can be readily identified using the
perturbagen itself as a probe. Biochemical methods of purification or, preferably,
yeast two-hybrid analysis provide convenient tools to elucidate perturbagen/target
interactions. Unlike mutations induced within genes that reside on chromosomes, it is
relatively straightforward to identify the target of the perturbation, and hence, the
mechanism that underlies the phenocopy trait.

As described above, insertional fusions that involve autofluorescent proteins
have numerous advantages as display scaffolds for peptides or protein fragments.
These proteins permit careful, rigorous measurement of the quantitative characteristics
of perturbagen libraries prepared with them. Manipulation of the perturbagen library
to enrich for sequences compatible with high expression levels and cell-by-cell
monitoring of perturbagen expression are readily achieved. One of the most
significant uses of the method disclosed herein involves the use of autofluorescent
proteins as scaffolds that can present perturbagens in vivo. These perturbagen
libraries provide, in essence, the means for genetic analysis that can be applied in
virtually all cells, as long as they can be cultured and exogenous nucleic acid can be
expressed within them.

EXAMPLE 1 

Construction of Peptide Display Libraries in the Interior of GFP 

An attractive strategy for the presentation of aptamers in cells involves the
insertion of aptamers into a protein scaffold such that upon expression the aptamers
are exposed on the surface of the scaffold. Immunoglobulins (Igs) provide a useful
analogy for this type of approach. The tertiary structure of the variable domain of an
Ig subunit is composed of a beta-barrel together with three exposed loops which form
hypervariable regions. These loops comprise antigen binding sites and can
accommodate a vast number of different sequences. Presumably, the rigidity and
stability of the beta-barrel structure facilitates the presentation of exposed loops such
that the variable peptide sequences assume unique, stable conformations. The
recently-solved crystal structure of GFP reveals that this protein also assumes a beta-
barrel structure and has a number of solvent-exposed loops (Ormo et al., 1996). These
loops are candidate sites for the insertion of random aptamers. By inserting aptamers
into a number of the loops in GFP, it is possible to identify "ideal" loops which can
accommodate and display aptamers while allowing GFP to retain its
autofluorescent properties.

Preparation and Testing of GFP Yeast Scaffold Candidates 

pVT21, which permits induction of GFP expression in the presence of
galactose, was obtained by manipulation of pACA151, a 6.7 Kb 2µ yeast shuttle vector
which contains markers for URA3 and ampicillin resistance. In addition it contains a
GFP expression cassette made up of the GAL1-10 promoter, the coding region of a
red-shifted (S65T) GFP gene, and the phosphoglycerate kinase (PGK1) 3' end. To
construct pVT21, the EcoRI site in pACA151 was converted into a BglII site. In
addition, the PGK1 3' end fragment of pACA151 was replaced with a 700 bp
fragment (containing NarI and BglII ends) which contained the PGK1 3' end with
termination codons in three reading frames.

Using the crystal structure of GFP as a guide, ten positions on the protein
which fall within exposed loops were chosen as potential aptamer insertion sites. Fig.
1. Into the corresponding regions of the GFP gene, recognition sequences for BamHI,
EcoRI and XhoI restriction endonucleases were introduced, yielding plasmids pVT22
to pVT31. Table 1. pVT21 was used as the parent vector for pVT22 to pVT31. In order
to construct pVT22, pVT21 was used as a template in two separate PCR reactions
using primer pairs OVT329 and OVT307, and OVT330 and OVT317. The termini of the
resulting fragments contained XhoI-EcoRI and BamHI-EcoRI restriction sites,
respectively. These two fragments were digested with EcoRI (NEB), ligated using T4
DNA ligase (Boehringer Mannheim), and PCR amplified using primers OVT329 and
OVT330. The resulting 2 Kb fragment contained the GAL1 UAS and PGK1 3' UTR,
as well as a GFP gene with a 6-codon insert corresponding to XhoI-EcoRI-BamHI
recognition sequences. pVT22 was obtained by digesting this 2 Kb fragment with
PstI and HindIII and inserting it into the pVT21 backbone (also digested with PstI and
HindIII). pVT23 to pVT31 were constructed using an identical cloning strategy except
that, instead of OVT307 and OVT317, the following primers were used: pVT23
(OVT308, OVT318), pVT24 (OVT309, OVT319), pVT25 (OVT310, OVT320),
pVT26 (OVT311, OVT321), pVT27 (OVT312, OVT322), pVT28 (OVT313,
OVT323), pVT29 (OVT314, OVT324), pVT30 (OVT315, OVT325), pVT31
(OVT316, OVT326). Table 2.



Construct     Insertion Site

pVT22         Thr49-Thr50
pVT23         Met78-Lys79
pVT24         Gly116-Asp117
pVT25         Lys140-Leu141
pVT26         Gly134-Asn135
pVT27         Gln157-Lys158
pVT28         Glu172-Asp173
pVT29         Leu194-Leu195
pVT30         Gly189-Asp190
pVT31         Glu213-Lys214

Table 1: Sites of insertion within the GFP gene of pVT22 to pVT31 of an
18 nucleotide fragment coding for the hexapeptide Leu-Glu-Glu-Phe-Gly-Ser. Amino
acid numbering is according to the wild type GFP gene.

This yielded ten GFP constructs, each of which contained six additional 
codons that included the restriction sites. These constructs were grown in E. coli and 
introduced into the yeast expression vector pVT21. Fig. 2. Yeast transformations 
were performed using the lithium acetate method (Gietz, R. and Schiestl, R. 1995
Methods in Molecular and Cellular Biology 5:255-269), and transformants were
selected and maintained on standard synthetic medium lacking uracil.

The resulting transformed yeast were grown under inducing conditions (i.e.,
galactose-containing media) to drive expression of the GFP hybrid proteins and
analyzed by flow sorter to gauge the levels of GFP fluorescence. Fig. 3 and Table 3.
Of the ten scaffold candidate constructs examined, the GFP constructs which retained
maximal fluorescence (pVT27, pVT28, and pVT29) were chosen as candidates to
insert aptamers within the XhoI and BamHI restriction sites.



Primer     Nucleotide Sequence

OVT309     TGA GAATTCCTCGAG ACrTTr A A aptth A.CTTCAGC
OVT310     1 ^>r\ Kj^j\i itLitUAU i CCA i CT TCI' ITAAAATCAATAC
OVT311     TGAGAATTCCTCGAGTTTC.Tr.Trr a at;a a T GTTTCCATC
OVT312     TGAG AATTCCTCG AGTTGTTTGTr Tr.r r a, TG A.TGTATAC
OVT313     TGA GAATTCCTCGAG TTCAATGTTGTr.TrTA 4,TTTGAAG
OVT314     TGAGAATTCCTCGA GGrr A A TTC( ; a r : r~ a TTTTGTTG AT
OVT315     TGA G AATTCCTC GAG A A GG a r a r,nr.rr a t^GCC
OVT316     TGAGAATTCCTCGAGTTrr,TTr,r.r.ATrTT T r :GAAAG
OVT317     TGAGAATTCGGATrr ACTdGA a a acta r^ TGTTGC a tgg
OVT318     TGAGAATTCGGArfJCAAACGGCATGACTTTTCAAGAG
OVT319     TGA GAATTCGGATCC GATACCCTTGTTA atat.a a _TCG~
OVT320     TGAGAATT^GGAJCCAACATTCTTGGACACAAAITGG
OVT321     TGAGAATTCGG A TC rTTGG a atac a actata, a CTCACAC
OVT322     TGAGAATTCGGATf^AAGAATGGAATCAAAGTTAACTTC
OVT323     TGAGAAJTCGGATCCGATGGAAGCGTTCAACTAGC
OVT324     TGA GAATTCGGATrr GATGGrrrTr.TccriTT a rr
OVT325     TGAGAATJTCGGATCCTTACCAGACAACCATTACCTG
OVT326     TGA GAATTCGGATCC A AG AG AG A CC AC a Tr.r;rrr
OVT329     GTTAGCTCACTCATTAGGCACCC
OVT330     CGGTATAGATCTG L A 1 AG 1TCATCCATGCCATGTG
APT1:      GGCCTAGGATCC
APT2:      TGA CTCGAG (N N (G/C/T))20 GGATCC TAGGCC

Table 2: Oligonucleotides. Restriction sites are underlined.

                   Fluorescence > 1X Bgd.            Fluorescence > 10X Bgd.
GFP CONSTRUCT      % Total Population   Mean (FU)    % Total Population   Mean (FU)

pVT21 (Dex.)       1                    3            0
pVT21              96                   1545         95                   1565
pVT27              89                   378          81                   414
pVT27APT           39                   41           15                   99
pVT28              86                   428          78                   471
pVT28APT           42                   28           13                   78
pVT29              77                   71           59                   90
pVT29APT           32                   7            2                    37

Table 3: Mean fluorescence intensities of cell populations harboring
pVT27APT, pVT28APT, pVT29APT and parent constructs. Fluorescence gates were
set either at background (Bgd.), or at a value ten-fold higher than background (10X
Bgd.). Background is defined as the minimum fluorescence intensity value which is
larger than the fluorescence value of 99% of non-induced cells.
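
The background definition used for Table 3 can be illustrated with a short calculation. The Python sketch below is one possible reading of that definition, not the instrument software actually used: it places the gate at the smallest observed non-induced intensity that exceeds 99% of the non-induced events (ties between equal intensities are ignored), and it assumes that the "Mean (FU)" columns report the mean of the gated cells only. All intensities are hypothetical.

```python
def background_gate(non_induced, percent=99):
    """Smallest observed intensity that is larger than `percent` percent of
    the non-induced events."""
    ranked = sorted(non_induced)
    # Number of events that must lie below the gate (integer ceiling).
    below = -(-percent * len(ranked) // 100)
    if below >= len(ranked):
        raise ValueError("too few events to place the gate")
    return ranked[below]


def gate_statistics(cells, gate):
    """Percent of the total population above the gate, and the mean
    intensity of the cells above the gate."""
    gated = [value for value in cells if value > gate]
    percent = 100.0 * len(gated) / len(cells)
    mean_fu = sum(gated) / len(gated) if gated else float("nan")
    return percent, mean_fu


# Hypothetical data: 100 non-induced events and a small library sample.
non_induced = [float(i) for i in range(1, 101)]
library = [50.0, 150.0, 400.0, 2000.0, 80.0, 1200.0, 90.0, 30.0]

bgd = background_gate(non_induced)
for label, gate in (("> 1X Bgd.", bgd), ("> 10X Bgd.", 10 * bgd)):
    percent, mean_fu = gate_statistics(library, gate)
    print(f"{label}: {percent:.0f}% of population, mean {mean_fu:.1f} FU")
```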

Preparation of Peptide Display Libraries 

DNA oligonucleotides coding for random 20 amino acid aptamers were
synthesized and inserted into the XhoI and BamHI sites of the three selected GFP
constructs mentioned above. 1 pmole of APT1 (Table 2) was annealed to 1 pmole
APT2 (Table 2) and the second strand was synthesized using Klenow fragment
(Promega, Madison WI). The resulting double stranded aptamers consisted of
BamHI and XhoI sites flanking 60 bases of biased random sequence. The GFP-
aptamer libraries in each of the three scaffold candidates were created by digesting the
aptamers with BamHI and XhoI, inserting them into BamHI/XhoI cut vector (either
pVT27, pVT28, or pVT29) and transforming the construct into E. coli. A total of
about 2,000 individual clones were selected from each library for testing purposes.
For each set of scaffold candidates, 20 random clones were examined to determine the
percentage of insert-bearing clones. All three had insert frequencies of at least 90%.

Evaluation of Peptide Display Libraries In Yeast

The amplified libraries from E. coli were transferred into yVT12 yeast cells
(MATa, HMLa, HMRa, sst2Δ, mfa1Δ::hisG, mfa2Δ::hisG, ade2-1, leu2-3, lys2,
ura3-1, STE3::GAL1-STE3::HIS3), derived from JRY5312 (Boyartchuk, V., Ashby,
M. et al., Science 1997 275:1796-1800). yVT12 cells containing the appropriate plasmid (or
library) were plated onto selective media supplemented with 2% dextrose or 2%
galactose/2% raffinose. Following incubation at 30°C, yeast derived from a single
colony (or, in the case of a library, from a patch of cells) were transferred into
selective liquid media supplemented with the appropriate carbon source. These
cultures were grown with shaking at 30°C until mid log phase. The yeast were
pelleted, resuspended in PBS, and scanned on a FACStarPLUS (Becton Dickinson,
San Jose CA) scanner with excitation at 488 nm. Fluorescence emission was
measured with a 515/40 nm band pass filter. Cytometer settings were: FSC E00V,
SSC 400V, FL1 470V, FSC threshold value 24. All scans were repeated in
independently cultured cells in triplicate. Though the absolute fluorescence levels of
different cells varied, the fluorescence appeared to be uniformly distributed
throughout the cells, not concentrated in clumps or subcellular compartments. This
suggested that the GFP-aptamer hybrid proteins were soluble in yeast.

To determine which of the three sites within GFP can best accommodate
peptides comprising 20 residues of diverse sequence, fluorescence scans on a flow
cytometer were carried out. Mean fluorescence intensities and the fraction of cells in
specific fluorescence intensity windows were determined for yeast cell populations
containing the libraries (see Table 3). The results suggested that two candidates
(pVT27APT and pVT28APT) provided a suitable site for library expression using
GFP as a scaffold, according to the method of scaffold design pursued in these
experiments. The other scaffold-aptamer library (pVT29APT) had a mean
fluorescence intensity that was close to the background level. Thus, of the sites we
examined in GFP (apart from the N- and C-termini), two were found to display a
variety of peptide aptamers in a manner compatible with autofluorescence. One of
these sites (corresponding to pVT27) is located within one of the smaller loops of the
protein (Ala155-Ile161). However, main chain atoms in this loop have the highest
temperature factors of any backbone atoms in the structure, as high as the solvent-
exposed N-terminus. This suggests that the insertion site is more mobile than other
loops and, as such, may not be an integral part of the structure.

The library species in pVT27APT and pVT28APT each had a mean
fluorescence intensity that was roughly 10% of the construct containing the linker
sequence alone. A fluorescence window was set to determine whether pVT27APT
and pVT28APT clones generally produced low fluorescence intensities, or whether
there was a wide range of intensities. At an intensity cutoff ten-fold above the
background (cells without GFP) where 95% of the control GFP-expressing yeast (with
pVT21) were above threshold, nearly 15% of the pVT27APT- and pVT28APT-
containing cells were also positive. This suggests that: (i) pVT27APT and
pVT28APT clones encode proteins that are either expressed at lower levels than wild
type GFP produced by pVT21, or are less fluorescent; and (ii) there is significant
variability in fluorescence among the individual library clones.

pVT27 was chosen as a scaffold candidate to build a large GFP-aptamer
library. To facilitate this, an oligonucleotide coding for a biased random 15 amino
acid aptamer (flanked by three constant amino acids on either end) was synthesized
and cloned into pVT27 (as described above under Preparation of Peptide Display
Libraries). The resulting library contained 1.5 x 10^6 members and was designated
pVT27APT2. A proportion of yeast harboring pVT27APT2 GFP-aptamer clones did
not fluoresce when grown under inducing conditions. Fig. 4. These dim yeast may
have lacked fluorescence due to termination codons in the random aptamer, improper
folding of the full-length GFP-aptamer protein, or for other reasons. Based on the
biased random DNA sequence encoding the aptamer, 27% of the library members
were expected to contain termination codons by chance, resulting in a truncated and
non-fluorescent GFP protein. From the fluorescence intensity profiles, it was
estimated that roughly 60% of the library sequences produced non-fluorescent
proteins. The difference (60% - 27%) may reflect the proportion of incorrectly folded
and/or unstable GFP proteins in the library. These approximate numbers were
corroborated by DNA sequence analysis of individual GFP-aptamer clones.
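
The expected frequency of termination codons can be reproduced with a short calculation. The Python sketch below assumes that the 15 random codons follow the N N (G/C/T) pattern shown for APT2 in Table 2 and that the bases are incorporated in equal proportions at each position; both are assumptions made for illustration. Under this bias only TAG remains as a possible termination codon, and the calculation gives a value close to the 27% stated above.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_fraction_per_codon(third_position_bases):
    """Fraction of equally likely codons N N X that are termination codons,
    where X is drawn from `third_position_bases`."""
    codons = ["".join(c) for c in product("ACGT", "ACGT", third_position_bases)]
    return sum(codon in STOP_CODONS for codon in codons) / len(codons)


def fraction_with_stop(n_codons, third_position_bases):
    """Probability that an insert of `n_codons` random codons contains at
    least one termination codon."""
    p = stop_fraction_per_codon(third_position_bases)
    return 1.0 - (1.0 - p) ** n_codons


unbiased = fraction_with_stop(15, "ACGT")   # no bias at the third position
biased = fraction_with_stop(15, "CGT")      # A excluded at the third position

print(f"unbiased inserts with a stop codon: {unbiased:.1%}")
print(f"biased inserts with a stop codon:   {biased:.1%}")
print(f"non-fluorescent for other reasons:  {0.60 - biased:.1%}")
```

The comparison with the unbiased case shows why excluding A at the third position is worthwhile: without the bias, roughly half of the 15-codon inserts would be expected to carry a termination codon.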

To further explore the question of the folded state of GFP-aptamer molecules
produced by the pVT27APT2 library, the fluorescence properties of 10 individual
clones were examined in detail. These yeast were obtained by collecting a
subpopulation of the pVT27APT2 yeast library which was fluorescent at a level above
that of induced cells. The sorted yeast clones were grown under inducing conditions,
and fluorescence emission at 515 nm was measured. Wild type GFP protein has
excitation and emission maxima at 395 nm and 509 nm, respectively. pVT21 and its
derivatives produce a red-shifted GFP variant which has an excitation maximum at
490 nm but also emits at 509 nm. Fluorescence analysis of the 10 clones with
excitation at 488 nm revealed a broad distribution of mean fluorescent values. Fig.
5A.

A Western blot of proteins extracted from yeast cells harboring these 10 clones
was prepared to provide an independent estimate of GFP-aptamer levels in these cells.
SDS-PAGE was carried out with the Laemmli Tris-buffer system (Laemmli, U.K.
Nature 1970 227:680-685). Gel transfer was performed using a Genie electrophoretic
blotter (Idea Scientific). Following blotting, the membrane was incubated
successively with rabbit antisera containing polyclonal anti-GFP antibodies (Clontech,
Palo Alto CA), and peroxidase conjugated anti-rabbit IgG (Santa Cruz Biotechnology,
Santa Cruz CA); and the bands were visualized with the peroxidase substrates
diaminobenzidine and hydrogen peroxide. There was a rough correlation between
expression and fluorescence levels. For example, clone B5 produced the least
fluorescence of any of the 10 clones examined, more than 100 fold below the parental
pVT27 construct. The protein level revealed by Western blot analysis was also the
lowest of the 10 clones. Fig. 5B.

The possibility of serious bias in the sequences of aptamers capable of display
by the pVT27 GFP scaffold was examined by sequence analysis of 53 independent
clones from the pVT27APT2 library. Table 4. These clones were selected from the
subset of pVT27APT2 sequences that generate fluorescent proteins by selection using
the flow sorter. Analysis of the amino acid distribution of these aptamers revealed
some statistically significant bias. Glycine, lysine, and threonine were over-
represented, compared to their expected frequency of occurrence, while leucine and
glutamate were under-represented. Glycine was one of the most dramatic outliers,
and this may reflect a preference for small, flexible residues in protein loops
(Edwards, M., Stenberg, J. et al. Protein Eng. 1987 1:173-181). Indeed,
overabundance of glycine at position 12 in the aptamer was the only statistically
significant difference (p<0.005) observed when the analysis was performed position
by position in the 15-residue aptamer sequence. However, it seems unlikely that there
is a dramatic bias in the structural/chemical properties encompassed by the aptamer
library in terms of charge or hydrophobicity, because no systematic preference for or
avoidance of residues of specific chemical types was observed.
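
The observed/expected ratios reported in Table 4 can be recomputed directly from the counts. The Python sketch below uses only a subset of rows whose expected and observed counts are clearly legible in Table 4; the variable and function names are hypothetical, and the statistical test used to obtain the P values is not specified in the text and is not reproduced here.

```python
# (expected count, observed count) for selected residues, taken from Table 4.
counts = {
    "Gly": (48.7, 92),
    "Thr": (48.7, 69),
    "Trp": (16.2, 27),
    "Asn": (32.5, 34),
    "Pro": (48.7, 43),
    "Phe": (32.5, 25),
    "Tyr": (32.5, 20),
}


def obs_exp_ratios(table):
    """Observed/expected ratio for each residue, rounded to two decimals."""
    return {aa: round(obs / exp, 2) for aa, (exp, obs) in table.items()}


ratios = obs_exp_ratios(counts)
for aa in sorted(ratios, key=ratios.get, reverse=True):
    print(f"{aa}: OBS/EXP = {ratios[aa]:.2f}")
```

Ratios well above 1 (for example glycine) indicate over-representation among the fluorescent clones, and ratios well below 1 indicate under-representation.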



AMINO ACID   EXPECTED #    OBSERVED #    OBS/EXP   P
Ala          48.7          46            0.95      0.68
Arg          64.9          66            1.09      0.18
Asn          32.5          34            1.05      0.75
Asp          32.5          36            1.11      0.68
Cys          32.5          28            1.11      0.68
Gln          16.2          15            0.93      0.87
Glu          16.2          28            0.86      0.04
Gly          48.7          92            1.89      <0.001
His          32.5          [illegible]   0.74      0.38
Ile          32.5          40            1.23      0.43
Leu          64.9          [illegible]   0.42      <0.001
Lys          [illegible]   [illegible]   2.03      0.002
Met          16.2          [illegible]   1.67      0.07
Phe          32.5          25            0.77      0.46
Pro          48.7          43            0.88      0.5[illegible]
Ser          81.3          66            0.81      0.18
Thr          48.7          69            1.42      0.018
Trp          16.2          27            1.67      0.071
Tyr          32.5          20            0.62      0.097
Val          48.7          52            1.07      0.65

Table 4: Analysis of amino acid composition of aptamer sequences 
among 53 randomly selected clones encoding "bright" GFP chimeras. 
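
As a rough check on Table 4, the observed/expected ratio and a tail probability for a single residue can be recomputed from the tabulated counts. The sketch below assumes a one-sided exact binomial test over the 795 residue positions (53 clones of 15 residues each). The text does not state which statistical test was actually used, so the choice of test and the function names are illustrative assumptions only.

```python
import math

# 53 sequenced clones, each carrying a 15-residue aptamer (from the text).
TOTAL_RESIDUES = 53 * 15

def obs_exp_ratio(observed, expected):
    """Ratio reported in the OBS/EXP column of Table 4."""
    return observed / expected

def binomial_upper_tail(observed, expected, n=TOTAL_RESIDUES):
    """P(X >= observed) for X ~ Binomial(n, expected / n).

    Assumption: residues are treated as independent draws; this is a
    hypothetical stand-in for whatever test produced the P column.
    """
    p = expected / n
    total = 0.0
    for k in range(observed, n + 1):
        # Work in log space so large binomial coefficients do not overflow.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total

# Glycine row of Table 4: expected 48.7, observed 92.
gly_ratio = obs_exp_ratio(92, 48.7)
gly_p = binomial_upper_tail(92, 48.7)
print(f"Gly OBS/EXP = {gly_ratio:.2f}, upper-tail p = {gly_p:.2e}")
```

Under these assumptions the glycine excess remains far below the 0.001 level, in line with the table entry.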

Preparation and Evaluation of Peptide Display Libraries in Mammalian Cells 

The above libraries and screening methods may be readily adapted for 
evaluation of mammalian cells, using materials and techniques that are familiar to 
those of skill in the art. 

Although a wide variety of genes encoding GFP are suitable for use in the 
methods described herein, a GFP-encoding gene that was human codon-optimized 
(E-GFP, available from, e.g., Clontech [catalog citation illegible]) was selected 
for use in experiments in which the peptide display libraries were expressed in 
representative mammalian cell lines. 

A suitable retroviral vector was constructed as follows. Retroviral vector 
pCLMFG (received from the laboratory of Dr. Inder Verma at the Salk Institute) was 
digested with HindIII, linearized with T4 DNA polymerase, and subsequently 
digested with ScaI. A 2874 bp fragment containing the retroviral elements was 
isolated and cloned into a 1.8 kb PvuII/SspI-digested Bluescript fragment that 
contains the bacterial origin of replication and an ampicillin resistance gene 
(commercially available through Stratagene, Inc.). This vector was designated 
pVT323. The Clontech vector containing E-GFP and pVT323 were each digested 
with NcoI and BamHI and religated. Plasmids containing the E-GFP and pVT323 
inserts in the correct orientation were isolated and designated as vector pVT324. 
Restriction sites just 3' of the E-GFP fragment were altered by cloning 
double-stranded oligonucleotides (sense sequence 5' 

CGAGAATATTGGAAGCTTGGGCGGCCGCGGATCCAGTGAATGAGTGC 3') 
into the XhoI and BamHI sites. This insertion also added stop codons in all three 
frames. This plasmid vector was designated pVT325. 
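
The sense oligonucleotide above can be scanned for restriction enzyme recognition sequences to see which cloning sites it carries. The sketch below is illustrative only: the four recognition sequences are the standard ones for these enzymes, but the text does not say which sites the authors intended to introduce, so the choice of enzymes to scan for is an assumption. Positions are 0-based offsets in the sense strand as printed.

```python
# Sense strand of the oligonucleotide as printed in the text.
SENSE_OLIGO = "CGAGAATATTGGAAGCTTGGGCGGCCGCGGATCCAGTGAATGAGTGC"

# Standard recognition sequences; the selection of enzymes is an assumption.
RECOGNITION_SITES = {
    "SspI": "AATATT",
    "HindIII": "AAGCTT",
    "NotI": "GCGGCCGC",
    "BamHI": "GGATCC",
}

def find_sites(seq, sites=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for each recognition motif."""
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        hits[enzyme] = positions
    return hits

site_positions = find_sites(SENSE_OLIGO)
for enzyme, positions in site_positions.items():
    print(enzyme, positions)
```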

Following the strategy described above for the construction of yeast scaffold 
vector pVT27, a modified E-GFP gene containing an internal 6-codon insert encoding 
the XhoI, HindIII and BamHI sites was placed under the control of a CMV promoter 
by cloning the gene into the NcoI and BglII restriction sites of plasmid vector 
pVT325. This plasmid vector was designated pVT334. Next, a library of DNA 
fragments encoding random 15 amino acid sequences was inserted into the XhoI and 
Hamlll [sic] sites of the E-GFP gene of pVT325, using a cloning strategy identical to that 
described above for the construction of the random peptide library in pVT27APT2. 
The resulting library contained 7 x 10^7 members. 
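
To put the library size in perspective, it can be compared with the number of possible 15-residue peptides. The short calculation below assumes all 20 amino acids are allowed at each of the 15 positions and that every library member is distinct, so the fraction is an upper bound; both are simplifying assumptions, not statements from the text.

```python
AMINO_ACIDS = 20        # assumption: all 20 residues allowed at every position
PEPTIDE_LENGTH = 15     # random 15 amino acid insert (from the text)
LIBRARY_SIZE = 7e7      # library members reported in the text

sequence_space = AMINO_ACIDS ** PEPTIDE_LENGTH
fraction_sampled = LIBRARY_SIZE / sequence_space

print(f"{sequence_space:.2e} possible 15-mers; "
      f"at most {fraction_sampled:.1e} of them are sampled")
```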

The fluorescence properties of the pVT334 library bearing DNA inserts 
encoding the random 15 amino acid library were evaluated by analyzing a random 
sampling of cells from that library. Three separate human melanoma cell lines 
(HS294T P 16 lad, WM35 and 1552C) were infected with viral supernatant from the 
pVT334 library. A representative fluorescence scan from the eighth day 
post-infection is provided in Fig. 7. This scan demonstrates that the infection and 
subsequent expression of the 15 amino acid library variants can be monitored in 
human cells by this technique. Moreover, the data suggest that the GFP/library 
variant constructs are stable over time. As with the pVT27APT2 library, a dim 
population was present, and may represent either uninfected cells or incorrectly 
folded, unstable or prematurely terminated constructs. 

A Western blot of proteins extracted from melanoma cells transformed with 
the pVT334-internal library construct was prepared using an aliquot of the cells used 
in the FACS analysis described above (Fig. 7). Cells were harvested two and eight days 
post-infection by trypsinization, washed in phosphate buffered saline, and resuspended 
in 1X gel loading buffer at a concentration of 10^7 cells/ml. Protein extracts were 
electrophoresed in a 10% Bis-Tris NuPage gel (Novex, San Diego, CA) and 
transferred to PVDF membrane (Millipore Corp., Bedford, MA). The membrane was 
probed with polyclonal anti-GFP antibodies (Clontech, Palo Alto, CA) followed by 
HRP-conjugated anti-rabbit IgG (Santa Cruz Biotechnology, Santa Cruz, CA). 
Bands were visualized using the ECL detection system (Amersham) (Fig. 8). These data 
provide an independent estimate of GFP-aptamer levels. As can be observed in Fig. 
8, shifts in molecular weight similar to those observed with pVT27APT2 in yeast 
were detected in protein extracted from melanoma cells expressing the internal 
library. These data confirm that expression of random peptides was achieved. 

EXAMPLE 2 

Construction of Constrained Amino- and Carboxy-terminal 
GFP-Aptamer Fusion Libraries 

A variety of experiments demonstrate that the N- and C-termini of GFP can be 
joined to foreign sequences without seriously compromising GFP activity (Cormack 
BP, Valdivia RH, Falkow S. Gene 1996, 173:33-38; Yang TT, Cheng L, Kain SR, 
Nucleic Acids Res., 1996, 24: 4592-4593). These properties of GFP suggest that it is 
possible to transform GFP into a display scaffold for perturbagen libraries that involve 
insertions of library sequences near the N- and C-termini. To ensure that the library 
sequences are maximally constrained in conformation, and that the maximum number 
of library sequences can be displayed at high level, it is preferable to introduce a 
sequence at the N- or C-terminus that separates the library sequences from the protein 
termini. Two possible strategies to identify useful sequences can be employed. First, 
the terminal flanking sequence can be derived from DNA encoded by synthetic 
oligonucleotides; or, second, the terminal sequence can be derived from native 
proteins found within cells. 

In both cases, an expression vector containing a GFP coding sequence must be 
prepared in such a way that a library of perturbagen-encoding sequences can be 
introduced. This involves a modest amount of molecular genetic engineering. The 
same vector, if engineered as described below, can be used as the starting material for 
both strategies. This vector contains a restriction site suitable for appending the 
terminal sequence, be it native or synthetic DNA, and a restriction site or sites 
appropriate for insertion of the library sequences. For example, the vector pVT21 
may be engineered using methods known in the art to contain three restriction sites 
located either at the 5' end of the GFP coding sequence or 
at the 3' end of the GFP coding sequence: EcoRI, XhoI, and BamHI (Fig. 2). 

Library Construction 

A DNA fragment encoding a random 15 amino acid sequence is cloned 
separately into the regions encoding the N- and C-terminus of GFP in pVT21. The 
resulting plasmids are amplified in E. coli and transformed into S. cerevisiae. 
Transformed yeast that retain maximal fluorescence (relative to yeast that express the 
GFP gene in pVT21) under inducing conditions are sorted from the rest of the 
population on a FACS machine. Those yeast with fluorescence intensities that are 
significantly greater than the mean fluorescence of the population (and that approach 
or exceed the mean fluorescence of yeast that express GFP in the pVT21 plasmid) are 
collected and plated for growth of single colonies. 
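
The sort criterion described above can be sketched as a simple gate on per-cell fluorescence. The text does not define "significantly greater" or how closely cells must "approach" the pVT21 control, so the two cut-offs below (two standard deviations above the population mean, and 80% of the control mean) are hypothetical parameters chosen only for illustration, as are the function and variable names.

```python
from statistics import mean, stdev

def select_bright_events(intensities, control_mean, n_sd=2.0, control_fraction=0.8):
    """Return intensities passing both gates.

    Gate 1: above population mean + n_sd * population SD (assumed cut-off).
    Gate 2: at least control_fraction of the pVT21 control mean (assumed).
    """
    pop_mean = mean(intensities)
    pop_sd = stdev(intensities)
    threshold = max(pop_mean + n_sd * pop_sd, control_fraction * control_mean)
    return [x for x in intensities if x >= threshold]

# Made-up fluorescence values: mostly dim cells plus one bright event.
library_events = [10, 12, 11, 9, 10, 11, 10, 12, 9, 500]
selected = select_bright_events(library_events, control_mean=550)
print(selected)
```

A real sort would set the gate on the instrument itself; the point here is only that both conditions in the text (well above the population, near the control) are applied together.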

Yeast cells harboring plasmids that confer fluorescence are purified from 
individual yeast colonies and their inserts sequenced. To choose suitable N- or 
C-terminal fusion sequences that satisfy the requirements of the invention, several 
criteria are considered. First, the terminal sequences must permit high-level 
expression and fluorescence of GFP molecules that include random peptide sequences 
positioned between the terminal sequence and the native GFP sequence. In addition, 
the ideal 15 amino acid extension sequence should preferably not be extremely 
charged or hydrophobic so as not to interact with cellular components. 
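
The charge and hydrophobicity criterion can be screened computationally before candidates are carried forward. The sketch below uses the Kyte-Doolittle hydropathy scale and a simple count of Lys/Arg versus Asp/Glu; the text names no scale and gives no numerical limits, so the scale, the thresholds, and the function names are all assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def net_charge(peptide):
    """Approximate net charge: +1 per Lys/Arg, -1 per Asp/Glu (His ignored)."""
    positive = sum(peptide.count(res) for res in "KR")
    negative = sum(peptide.count(res) for res in "DE")
    return positive - negative

def mean_hydropathy(peptide):
    """Average Kyte-Doolittle value over the peptide."""
    return sum(KYTE_DOOLITTLE[res] for res in peptide) / len(peptide)

def is_acceptable_extension(peptide, max_abs_charge=3, max_abs_hydropathy=1.5):
    """Hypothetical filter: reject extremely charged or hydrophobic sequences."""
    return (abs(net_charge(peptide)) <= max_abs_charge
            and abs(mean_hydropathy(peptide)) <= max_abs_hydropathy)

# Illustrative 15-residue candidates (not sequences from the text).
for candidate in ("GSGSGSGSGSGSGSG", "KKKKKKKKKKKKKKK", "LLLLLLLLLLLLLLL"):
    print(candidate, is_acceptable_extension(candidate))
```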

Five (or more) plasmids are selected on the basis of their amino acid sequence 
composition. Random aptamers are inserted into each of these five constructs 
between the terminal sequence addition and the body of GFP, and the resulting 
libraries transformed into yeast. The transformed yeast are grown under inducing 
conditions and scanned using a FACS. The plasmid which best accommodates 
random inserts while retaining fluorescence is chosen based on its mean and median 
fluorescence intensities compared to controls such as the background fluorescence of 
yeast and the mean fluorescence of pVT21-containing yeast cells. This scaffold is 
used to construct a large-scale random aptamer library using methods known in the art 
(Ausubel et al., 1996) and as described in Example 1. 
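
One way to make the comparison of candidate scaffolds concrete is to express each library's mean and median fluorescence on a scale running from background (0) to the pVT21 control (1). The sketch below does this and picks the candidate whose weaker statistic is highest. The normalization, the scoring rule, and all names are assumptions for illustration; the text only says that mean and median intensities are compared to the controls.

```python
from statistics import mean, median

def relative_brightness(intensities, background_mean, control_mean):
    """Scale mean and median so that background = 0 and pVT21 control = 1."""
    scale = control_mean - background_mean
    return {
        "mean": (mean(intensities) - background_mean) / scale,
        "median": (median(intensities) - background_mean) / scale,
    }

def best_scaffold(libraries, background_mean, control_mean):
    """Pick the library whose lower normalized statistic is highest (assumed rule)."""
    def score(name):
        rel = relative_brightness(libraries[name], background_mean, control_mean)
        return min(rel["mean"], rel["median"])
    return max(libraries, key=score)

# Made-up per-cell intensities for two hypothetical candidate scaffolds.
candidate_libraries = {
    "scaffold_A": [5, 6, 7, 100],    # a few bright cells, mostly dim
    "scaffold_B": [50, 60, 70, 80],  # uniformly moderate fluorescence
}
chosen = best_scaffold(candidate_libraries, background_mean=5, control_mean=105)
print(chosen)
```

Using both statistics guards against a scaffold that looks good on the mean only because of a small number of very bright cells.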

EXAMPLE 3 

GFP Fusions Composed of N- or C-Terminal Fab Domains 
that Present Peptide Aptamers 

Higher mammals can generate antibodies capable of binding specifically and 
tightly to almost any compound. As such, immunoglobulins (Igs) can be considered as 
ideal protein scaffolds for the display of short peptide aptamers. The variable domain 
of an Ig subunit consists of a beta-barrel together with three exposed loops that form 
hypervariable regions (HVRs) (Edmundson, A., Ely, K. et al. 1975 Biochemistry 
14:3953-3961). These HVRs comprise the antigen binding sites and, depending on the 
class of Ig, can accommodate between 6 and 15 amino acids of random sequence. 

Recently, Igs have been engineered to produce minibodies (Pessi, A., Bianchi, 
E. et al. 1993 Nature 362:367-369). A minibody is a 61 amino acid polypeptide 
consisting of three strands from each of the two beta sheets of the Fab variable 
domain of the mouse immunoglobulin, together with the H1 and H2 hypervariable 
regions. H1 and H2 can each display a random peptide sequence of 6 amino acids. 
Furthermore, it has been demonstrated (using phage display) that a minibody library 
can be used to isolate a minibody which binds tightly and specifically to human 
interleukin-6 (Martin, F., Toniatti, C. et al. 1994 EMBO Journal 13:5303-5309). 
These properties of a minibody suggest that it can be used in conjunction with GFP to 
produce an autofluorescent protein capable of presenting random peptides. 

Construction of Minibody-GFP Fusion Library 

Using methods known in the art (see Example 1), a minibody coding sequence 
as described in Pessi et al. (1993) is cloned separately into sites located at coding 
sequences for the N- and C-terminus of GFP in, e.g., pEGFP-C and pEGFP-N 
(Clontech Catalog 97/98, p. 114-115). These hybrid constructs are tested to ensure 
that they maintain fluorescence in vivo using a flow sorter or similar device. As 
described in Martin et al. (1994), cloning sites for a library can be introduced into the 
modified minibody-GFP vector to permit introduction of random oligonucleotides 
coding for random 6 amino acid peptides into either one or both of the HVRs in the 
minibody. After preliminary studies to confirm that the minibody-GFP fusion proteins 
are autofluorescent, this minibody-GFP scaffold is used to produce a large-scale 
library as described in Example 1. 

EXAMPLE 4 

Use of GFP/Peptide Fusions in Genetic Screens/Selections in Human Cells 

The peptide display scaffold of this invention can be used for genetic 
experiments in mammalian cells, including human cells. Conceptually, these 
experiments are very similar to those carried out in yeast, but they involve certain 
technical differences that involve growth of the cells, details of the expression vector 
used to drive expression of the peptide scaffold, and transfer of DNA into the cells 
(e.g., PCT/US97/14514, Selection Systems for the Identification of Genes Based on 
Functional Analysis). For the purposes of the invention described herein, we give a 
specific example of a mammalian expression vector. 

The expression library is constructed in the vector shown in Fig. 6. The vector 
is similar in design to that of Fig. 2. It is based on pEGFP-C1 (Clontech), which 
contains a pUC19 origin of replication and a bacterial promoter upstream of the gene 
encoding kanamycin resistance; these allow selection and propagation in E. coli. The 
vector also contains signals for selection and maintenance in mammalian cells: an 
SV40 promoter that drives expression of a neomycin resistance gene followed by an 
SV40 polyadenylation signal and an SV40 origin of replication. The vector encodes a 
red-shifted GFP variant optimized for expression in mammalian cells linked to a 
multiple cloning site and polyadenylation signal. The EGFP sequence was modified 
as described in Example 1 to contain a KpnI/EcoRI/BamHI linker at codon position 
156/157 (as in pVT27, Example 1). The modified EGFP sequence was cloned into the 
EGFP-C1 vector treated previously to remove the BamHI site in its polylinker (by 
digestion with BglII and BamHI and religation, thus forming a BglII/BamHI hybrid 
site in the multiple cloning site). Two "splint" oligonucleotides labeled "antisense" 
were annealed to the randomer oligonucleotide ("sense") under conditions favoring 
formation of perfectly matched duplex (as in Example 1), and ligated into the 
KpnI/BamHI digested vector to generate a large population of in-frame, random 
45-mer oligonucleotide insert sequences for expression of random 15-mer peptide 
insertions in GFP in mammalian cells. 



The oligonucleotide sequences are: 
sense:     5' C AGC GCT GG - (NNX)15 - GGG TCC GCA G 3' 

antisense: 3' CA TGG TCG CGA CCG 5'   T CCC AGG CGT CCT AG 5' 
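
The layout of the randomer ("sense") oligonucleotide can be sketched in a few lines: fixed flanks on either side of fifteen NNX codons, giving a 45-nucleotide in-frame insert. The flanking bases below are taken from the sense sequence as best it can be read. The composition of the X position is not defined in this section, so the default of G or T at the third position is an assumption, exposed as a parameter; the function name is likewise hypothetical.

```python
import random

# Fixed flanks read from the sense sequence (spaces removed).
FLANK_5 = "CAGCGCTGG"
FLANK_3 = "GGGTCCGCAG"
N_CODONS = 15  # (NNX)15, encoding a random 15-mer peptide

def random_sense_oligo(third_position="GT", rng=None):
    """Build one randomer: 5' flank + 15 NNX codons + 3' flank.

    third_position is the set of bases allowed at X. "GT" is an assumed
    default; the text does not define X in this section.
    Returns (full oligo, 45-nt random insert).
    """
    if rng is None:
        rng = random.Random()
    codons = [
        rng.choice("ACGT") + rng.choice("ACGT") + rng.choice(third_position)
        for _ in range(N_CODONS)
    ]
    insert = "".join(codons)
    return FLANK_5 + insert + FLANK_3, insert

oligo, insert = random_sense_oligo(rng=random.Random(0))
print(len(insert), len(oligo))
```

Restricting the third codon position is a common way to reduce the frequency of stop codons in random libraries, which is one plausible reason for an NNX rather than NNN design.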

The above examples are provided to illustrate the invention but not to limit its 
scope. Other variants of the invention will be readily apparent to one of ordinary skill 
in the art and are encompassed by the appended claims. All publications, patents, and 
patent applications cited herein are hereby incorporated by reference. 

CLAIMS 

What is claimed is: 

1. A nucleic acid sequence encoding a peptide display scaffold comprising: 
    a) a first sequence encoding an autofluorescent protein; 
    b) a site designed to allow a second sequence encoding an amino acid 
    sequence to be inserted into the first sequence to encode a second protein 
    capable of emitting light; 
wherein the peptide display scaffold is designed to display the amino acid 
sequence in a constrained conformation. 

2. The nucleic acid sequence of claim 1 wherein the autofluorescent 
protein is GFP. 

3. The nucleic acid sequence of claim 1 wherein the autofluorescent 
protein is green fluorescent protein from the jellyfish Aequorea victoria. 

4. A nucleic acid sequence encoding a peptide display scaffold comprising: 
    a) a first sequence encoding an autofluorescent protein; 
    b) a site designed to allow a second sequence encoding an amino acid 
    sequence to be inserted into the first sequence to encode a second protein 
    capable of emitting light; 
wherein the site is located in a region of the first sequence that corresponds to 
a solvent exposed region in the tertiary structure of the autofluorescent protein. 

5. The nucleic acid sequence of claim 4, wherein the site is located in a 
region of the first sequence that corresponds to a beta turn in the tertiary structure of 
the autofluorescent protein. 

6. The nucleic acid sequence of claim 4 wherein the autofluorescent 
protein is GFP. 

7. The nucleic acid sequence of claim 6 wherein the autofluorescent 
protein is green fluorescent protein from the jellyfish Aequorea victoria. 

8. The nucleic acid sequence of claim 6, wherein the site is located in the 
region of the first sequence encoding the Ala 155 to Ile 161 region of the 
autofluorescent protein. 

9. The nucleic acid sequence of claim 6, wherein the site is located in the 
region of the first sequence encoding the Lys 162 to Gln 183 region of the 
autofluorescent protein. 

10. The nucleic acid sequence of claim 6, wherein the site is located in the 
region of the first sequence encoding the Gln 184 to Ser 205 region of the 
autofluorescent protein. 

11. A peptide displayed on a scaffold comprising: 
    a) a scaffold amino acid sequence comprising an autofluorescent 
    protein, and 
    b) the peptide inserted into the scaffold amino acid sequence, 
wherein the molecular combination of the peptide displayed on the scaffold is 
capable of emitting light and wherein the peptide is displayed in a constrained 
conformation. 

12. The peptide displayed on a scaffold of claim 11 wherein the 
autofluorescent protein is GFP. 

13. The peptide displayed on a scaffold of claim 11 wherein the 
autofluorescent protein is green fluorescent protein from the jellyfish Aequorea 
victoria. 

14. A peptide displayed on a scaffold comprising: 
    a) a scaffold amino acid sequence comprising an autofluorescent 
    protein, and 
    b) the peptide inserted into the scaffold amino acid sequence, 
wherein the molecular combination of the peptide displayed on the scaffold is 
capable of emitting light and wherein the peptide is inserted into a solvent exposed 
region in the tertiary structure of the scaffold amino acid sequence. 

15. The nucleic acid sequence of claim 14, wherein the site is located in a 
region of the scaffold amino acid sequence that corresponds to a beta turn in the 
tertiary structure of the autofluorescent protein. 

16. The nucleic acid sequence of claim 14 wherein the autofluorescent 
protein is GFP. 

17. The nucleic acid sequence of claim 16 wherein the autofluorescent 
protein is green fluorescent protein from the jellyfish Aequorea victoria. 

18. The nucleic acid sequence of claim 16, wherein the peptide is inserted 
in the region of the first sequence encoding the Ala 155 to Ile 161 region of the 
autofluorescent protein. 

19. The nucleic acid sequence of claim 16, wherein the peptide is inserted 
in the region of the first sequence encoding the Lys 162 to Gln 183 region of the 
autofluorescent protein. 

20. The nucleic acid sequence of claim 16, wherein the peptide is inserted 
in the region of the first sequence encoding the Gln 184 to Ser 205 region of the 
autofluorescent protein. 

21. A method for engineering a nucleic acid sequence encoding a peptide 
display scaffold, comprising: 
    a) inserting a linker sequence comprising a site between the ends of a 
    first sequence encoding a first molecule capable of emitting light to generate 
    a second sequence encoding a scaffold candidate, 
    b) quantitatively determining a property of the light emitted by the 
    scaffold candidate, 
    c) selecting the candidate if the property quantitatively-determined in 
    step b) meets a pre-determined criterion, 
    d) inserting a library sequence at the site to generate a third sequence, 
    e) quantitatively determining a property of the light emitted by the 
    molecule encoded by the third sequence, and 
    f) re-selecting the candidate if the property quantitatively-determined 
    in step e) meets a pre-determined criterion. 

22. The method of claim 21, wherein the quantitative determination of step 
b) or step e) is made using a flow sorter device. 

23. The method of claim 21, wherein the quantitatively-determined 
property is the intensity of the emitted light. 

24. A peptide display library, comprising a plurality of expression vectors, 
wherein the vectors comprise: 
    a) a first nucleic acid sequence encoding an autofluorescent protein, 
    and 
    b) a second nucleic acid sequence encoding an amino acid sequence, 
wherein at least one of the plurality of vectors encodes a molecule capable of 
emitting light in which the amino acid sequence is displayed in a constrained 
conformation when the vector is expressed in a host cell. 

25. A peptide display library, comprising a plurality of expression vectors, 
wherein the vectors comprise: 
    a) a first nucleic acid sequence encoding an autofluorescent protein; 
    and 
    b) a second nucleic acid sequence encoding an amino acid sequence 
    inserted into a site located in a region of the first sequence corresponding to 
    a solvent exposed region in the tertiary structure of the autofluorescent protein; 
wherein at least one of the plurality of expression vectors encodes a molecule 
capable of emitting light. 

26. The peptide display library of claim 25, wherein the second sequence is 
inserted into a site in the first sequence corresponding to a beta turn in the tertiary 
structure of the first molecule. 

27. A method of selecting a subset of a peptide display library, comprising: 
    a) introducing a library into a plurality of host cells, wherein the library 
    comprises a plurality of expression vectors, wherein the expression vectors 
    further comprise: 
        1) a first nucleic acid sequence encoding an autofluorescent 
        protein, and 
        2) a second nucleic acid sequence encoding an amino acid 
        sequence, wherein at least one of the plurality of vectors encodes a 
        molecule capable of emitting light in which the amino acid sequence is 
        displayed in a constrained conformation when the vector is expressed in 
        a host cell; 
    b) quantitatively determining a property of the light emitted by the 
    second molecules expressed in the plurality of host cells, and 
    c) selecting from the plurality of host cells a subset of cells, wherein 
    for each of the selected cells, the property quantitatively determined in step b) 
    meets a pre-determined criterion. 

28. The method of claim 27, wherein the quantitative determination of step 
b) is accomplished using a flow-sorter device. 

29. The method of claim 27, wherein the quantitatively-determined 
property is the intensity of the emitted light. 

30. The method of claim 27, wherein the host cells are bacterial cells. 

31. The method of claim 27, wherein the host cells are archaebacterial 
cells. 

32. The method of claim 27, wherein the host cells are fungal cells. 

33. The method of claim 27, wherein the host cells are mammalian cells. 

34. The method of claim 27, wherein the host cells are insect cells. 

35. The method of claim 27, wherein the host cells are plant cells. 

36. A method of selecting a subset of a peptide display library, comprising: 
    a) introducing a library into a plurality of host cells, wherein the library 
    comprises a plurality of expression vectors, wherein the expression vectors 
    further comprise: 
        1) a first nucleic acid sequence encoding an autofluorescent 
        protein, and 
        2) a second nucleic acid sequence encoding an amino acid 
        sequence inserted into a site located in a region of the first sequence 
        corresponding to a solvent exposed region in the tertiary structure of the 
        autofluorescent protein; 
    b) quantitatively determining a property of the light emitted by the 
    second molecules expressed in the plurality of host cells, and 
    c) selecting from the plurality of host cells a subset of cells, wherein 
    for each of the selected cells, the property quantitatively determined in step b) 
    meets a pre-determined criterion. 

37. The method of claim 36, wherein the quantitative determination of step 
b) is accomplished using a flow-sorter device. 

38. The method of claim 36, wherein the quantitatively-determined 
property is the intensity of the emitted light. 

39. A method of identifying a peptide display library sequence encoding a 
peptide of interest, comprising: 
    a) introducing a library into a plurality of host cells, wherein the library 
    comprises a plurality of expression vectors, wherein the expression vectors 
    further comprise: 
        1) a first nucleic acid sequence encoding an autofluorescent 
        protein, and 
        2) a second nucleic acid sequence encoding a peptide, wherein 
        at least one of the plurality of vectors encodes a second molecule capable 
        of emitting light in which the peptide is displayed in a constrained 
        conformation when the vector is expressed in a host cell; 
    b) selecting one or more host cells in which expression of the peptide 
    sequence confers a phenotypic variation upon the host cell, and 
    c) recovering the peptide sequence from the selected host cell. 

40. A method of identifying a peptide display library sequence encoding a 
peptide of interest, comprising: 
    a) introducing a library into a plurality of host cells, wherein the library 
    comprises a plurality of expression vectors, wherein the expression vectors 
    further comprise: 
        1) a first nucleic acid sequence encoding an autofluorescent 
        protein, and 
        2) a second nucleic acid sequence encoding a peptide inserted 
        into a site located in a region of the first sequence corresponding to a 
        solvent exposed region in the tertiary structure of the autofluorescent 
        protein, wherein at least one of the plurality of vectors encodes a second 
        molecule capable of emitting light when the vector is expressed in a host 
        cell; 
    b) selecting one or more host cells in which expression of the peptide 
    sequence confers a phenotypic variation upon the host cell, and 
    c) recovering the peptide sequence from the selected host cell. 

41. An expression vector comprising: 
    a) a first nucleic acid sequence encoding an autofluorescent protein; 
    and 
    b) an insertion site designed to allow a second nucleic acid sequence 
    encoding an amino acid sequence to be inserted into the first sequence to 
    encode a second protein capable of emitting light wherein the amino acid 
    sequence is displayed in a constrained conformation as part of the second 
    protein. 

42. The expression vector of claim 41 wherein the autofluorescent protein 
is GFP. 

43. The expression vector of claim 41 wherein the autofluorescent protein 
is green fluorescent protein from the jellyfish Aequorea victoria. 

44. An expression vector comprising: 
    a) a first nucleic acid sequence encoding an autofluorescent protein; 
    and 
    b) an insertion site designed to allow a second nucleic acid sequence 
    encoding an amino acid sequence to be inserted into the first sequence to 
    encode a second protein capable of emitting light; 
wherein the site is located in a region of the first sequence that corresponds to 
a solvent exposed region in the tertiary structure of the autofluorescent protein. 

45. The expression vector of claim 44, wherein the site is located in a 
region of the first sequence that corresponds to a beta turn in the tertiary structure of 
the autofluorescent protein. 

46. The expression vector of claim 44, wherein the autofluorescent protein 
is GFP. 

47. The expression vector of claim 44, wherein the autofluorescent protein 
is green fluorescent protein from the jellyfish Aequorea victoria. 

48. The expression vector of claim 44, wherein the site is located in the 
region of the first sequence encoding the Ala 155 to Ile 161 region of the 
autofluorescent protein. 

49. The expression vector of claim 44, wherein the site is located in the 
region of the first sequence encoding the Lys 162 to Gln 183 region of the 
autofluorescent protein. 

50. The expression vector of claim 44, wherein the site is located in the 
region of the first sequence encoding the Gln 184 to Ser 205 region of the 
autofluorescent protein. 